Goto

Collaborating Authors

 minimax risk


Locally Optimal Private Sampling: Beyond the Global Minimax

Neural Information Processing Systems

We study the problem of sampling from a distribution under local differential privacy (LDP). Given a private distribution P P, the goal is to generate a single sample from a distribution that remains close to P in f-divergence while satisfying the constraints of LDP.


Understanding Generalization in Physics Informed Models through Affine Variety Dimensions

Neural Information Processing Systems

Physics-informed machine learning is gaining significant traction for enhancing statistical performance and sample efficiency through the integration of physical knowledge. However, current theoretical analyses often presume complete prior knowledge in non-hybrid settings, overlooking the crucial integration of observational data, and are frequently limited to linear systems, unlike the prevalent nonlinear nature of many real-world applications. To address these limitations, we introduce a unified residual form that unifies collocation and variational methods, enabling the incorporation of incomplete and complex physical constraints in hybrid learning settings. Within this formulation, we establish that the generalization performance of physics-informed regression in such hybrid settings is governed by the dimension of the affine variety associated with the physical constraint, rather than by the number of parameters. This enables a unified analysis that is applicable to both linear and nonlinear equations. We also present a method to approximate this dimension and provide experimental validation of our theoretical findings.


Blind Attacks on Machine Learners

Neural Information Processing Systems

The importance of studying the robustness of learners to malicious data is well established. While much work has been done establishing both robust estimators and effective data injection attacks when the attacker is omniscient, the ability of an attacker to provably harm learning while having access to little information is largely unstudied. We study the potential of a "blind attacker" to provably limit a learner's performance by data injection attack without observing the learner's training set or any parameter of the distribution from which it is drawn. We provide examples of simple yet effective attacks in two settings: firstly, where an "informed learner" knows the strategy chosen by the attacker, and secondly, where a "blind learner" knows only the proportion of malicious data and some family to which the malicious distribution chosen by the attacker belongs. For each attack, we analyze minimax rates of convergence and establish lower bounds on the learner's minimax risk, exhibiting limits on a learner's ability to learn under data injection attack even when the attacker is "blind".


Total Variation Classes Beyond 1d: Minimax Rates, and the Limitations of Linear Smoothers

Neural Information Processing Systems

We consider the problem of estimating a function defined over nlocations on a d-dimensional grid (having all side lengths equal to n1/d). When the function is constrained to have discrete total variation bounded by Cn, we derive the minimax optimal (squared) `2 estimation error rate, parametrized by n,Cn. Total variation denoising, also known as the fused lasso, is seen to be rate optimal. Several simpler estimators exist, such as Laplacian smoothing and Laplacian eigenmaps. A natural question is: can these simpler estimators perform just as well?


Information-theoretic Limits of Online Classification with Noisy Labels

Neural Information Processing Systems

We study online classification with general hypothesis classes where the true labels are determined by some function within the class, but are corrupted by stochastic noise, and the features are generated adversarially. Predictions are made using observed labels and noiseless features, while the performance is measured via minimax risk when comparing against labels. The noisy mechanism is modeled via a general noisy kernel that specifies, for any individual data point, a set of distributions from which the actual noisy label distribution is chosen. We show that minimax risk is characterized (up to a logarithmic factor of the hypothesis class size) by the of the noisy label distributions induced by the kernel, of other properties such as the means and variances of the noise. Our main technique is based on a novel reduction to an online comparison scheme of two hypotheses, along with a new version of Le Cam-Birgรฉ testing suitable for online settings. Our work provides the first comprehensive characterization of noisy online classification with guarantees that apply to the while addressing noisy observations.






Towards Sharp Minimax Risk Bounds for Operator Learning

arXiv.org Machine Learning

A new paradigm in machine learning for scientific computing is focused on designing learning algorithms and methods for continuum problems. This paradigm is referred to as operator learning and has received considerable interest in the last few years [5,7,18,20,23-25,27,30,34,36]. The basic task may be posed as learning a map between infinite-dimensional function spaces, i.e., learning an operator F: X Y, where, for example, X and Y are real, separable Hilbert spaces. Operator learning naturally arises in many scientific problems where one wants to learn how a continuum model, often described by partial differential equations (PDEs), maps inputs, such as parameters or boundary conditions, to outputs, such as states or observables. A prototypical example to keep in mind is learning parameter-to-solution maps of parametric PDEs [1,2,11]. In contrast to more classical surrogate modeling, which typically focuses on learning finite-dimensional parameter-to-solution maps for some fixed discretization, operator learning directly aims to learn/approximate the continuum map F: X Y itself. Thus, the inputs and outputs are functions (not vectors) and the goal is to directly design discretization-invariant methods [7,23]. From a statistical perspective, this naturally leads to a nonparametric regression problem in which both the object of interest (the operator) and the observations (finite number of noisy samples) are infinite-dimensional.